Abstract. - The promise of punishment and reward in promoting public cooperation is debatable. 
While punishment is traditionally considered more successful than reward, the fact that the cost of 
punishment frequently fails to offset gains from enhanced cooperation has led some to reconsider 
reward as the main catalyst behind collaborative efforts. Here we elaborate on the "stick versus 
carrot" dilemma by studying the evolution of cooperation in the spatial public goods game, where 
besides the traditional cooperators and defectors, rewarding cooperators supplement the array of 
possible strategies. The latter are willing to reward cooperative actions at a personal cost, thus 
effectively downgrading pure cooperators to second-order free-riders due to their unwillingness 
to bear these additional costs. Consequently, we find that defection remains viable, especially if 
the rewarding is costly. Rewards, however, can promote cooperation, especially if the synergetic 
effects of cooperation are low. Surprisingly, moderate rewards may promote cooperation better 
than high rewards, which is due to the spontaneous emergence of cyclic dominance between the 
three strategies. 
Introduction. — Sustainable development and intact social stability require collaborative
efforts. Although selfishness and competitiveness are an inherent part of human nature, field
studies and experiments attest to the fact that humans are willing to cooperate if the conditions
are right [1]. Failure to do so results in the exploitation of public goods, such as environmental
resources or social benefits, by defectors, who in doing so reap benefits at the expense of
cooperators. The "tragedy of the commons" succinctly describes such a situation [2]. In pairwise
interactions reciprocation can work in favor of cooperation [3-5]. If more than two persons are
involved, however, reciprocation becomes challenging and the burden of sustaining cooperation
often falls on punishment [6-9], as reviewed comprehensively in [10]. The Achilles' heel of
punishment is the fact that it is costly, and it is therefore not clear how it emerges and how it
can be stabilized. Those that contribute to the common good but abstain from punishing wrongdoers
are "second-order free-riders", who, in the absence of additional incentives aimed at sustaining
punishment, prevail and thus eliminate the threat of sanctioning [11-13]. To avert this
unfortunate scenario, it has recently been suggested that punishment should be a coordinated act
[14]. It has also been shown that the network reciprocity in structured populations alone may be
sufficient to solve the second-order free-rider problem [15,16], and pool-punishment has been
considered as an alternative to the traditionally employed peer-punishment with remarkable
success [17]. Nevertheless, studies critically probing the effectiveness of punishment in
sustaining cooperation, for example in conjunction with anti-social punishment [18], indirect
reciprocity [19], or unfair sanctions [20], are an important reminder of open questions still
surrounding the subject.

Reward is an established alternative to punishment [21,22], albeit studied less frequently in the
past. While punishment implies paying a cost for another person to incur a cost, reward likewise
implies paying a cost, but for another person to experience a benefit. The majority of previous
studies addressing the "stick versus carrot" dilemma concluded that punishment is more effective
than reward in sustaining public cooperation [10]. But as pointed out in a recent paper by Rand
et al. [23], most of these studies disregarded future consequences of today's actions. Indeed,
reputation is key [24] and represents a precious asset to lose over an act of punishment that may
or may not help in reforming the punished individual. Rewarding is in this respect a safer
alternative, and as concluded in [23], may be as effective as punishment for maintaining public
cooperation. Inspired by these experimental findings, we here investigate the impact of reward on
the evolution of cooperation in the spatial public goods game by means of an additional third
strategy. The so-called rewarding cooperators, i.e. cooperators that reward other cooperators,
are willing to bear additional costs in order to reward those that contribute to the common good.
As with the introduction of costly punishment, the traditional cooperators, i.e. those that
contribute to the common good but do not reward other cooperators, become second-order
free-riders that fiercely challenge the proliferation of rewarding cooperators. We come to
interesting and partly counterintuitive conclusions that go well
with existing studies on punishment in structured populations [15,16,25,26], as well as
supplement the array of other mechanisms, such as voluntary participation [27], social [28] and
group [29] diversity, random exploration of strategies [30], or similar additions [31-34], that
can be associated with the promotion of cooperation in public goods games.

Public goods game with reward. — The public goods game is staged on a square lattice with
periodic boundary conditions, whereon initially each player on site x is designated either as a
cooperator ($s_x = C$), defector ($s_x = D$), or rewarding cooperator ($s_x = RC$), with equal
probability. Players play the game with their k = 4 nearest neighbors. Accordingly, each
individual belongs to five different groups, i.e. it is the focal individual of one von Neumann
neighborhood and a member of the von Neumann neighborhoods of its four nearest neighbors.
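
The group structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: sites are addressed by integer coordinates, and the helper names `neighbors` and `groups_of` are hypothetical, not taken from the original study.

```python
def neighbors(x, y, L):
    """Four nearest neighbors of site (x, y) on an L x L lattice with periodic boundaries."""
    return [((x + 1) % L, y), ((x - 1) % L, y), (x, (y + 1) % L), (x, (y - 1) % L)]


def groups_of(x, y, L):
    """The k + 1 = 5 groups containing (x, y); each group is [focal site, its four neighbors]."""
    focal_sites = [(x, y)] + neighbors(x, y, L)
    return [[f] + neighbors(f[0], f[1], L) for f in focal_sites]
```

Each returned group has five members, and the site (x, y) appears in every one of them, once as the focal player and four times as a neighbor of the focal player.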

Using standard parametrization, the two cooperating 
strategies (C and RC) contribute 1 to the public good 
while defectors contribute nothing. The sum of all con- 
tributions is multiplied by the factor r > 1, reflecting the 
synergetic effects of cooperation, and the resulting amount 
is subsequently equally shared among the k + 1 interacting individuals irrespective of their
strategies. In addition, here each cooperator (C and RC) receives the reward $\beta/k$ from every
rewarding cooperator that is a member of the focal group, and every rewarding cooperator from
this group therefore bears an additional cost $\gamma/k$, thus leading to different payoffs of Cs
and RCs. Denoting the number of Cs, Ds, and RCs among the k interaction partners by $N_C$, $N_D$,
and $N_{RC}$, respectively, each cooperator gets

$$P_C = \frac{r(N_C + N_{RC} + 1)}{k + 1} - 1 + \beta \frac{N_{RC}}{k} , \qquad (1)$$

a defector receives

$$P_D = \frac{r(N_C + N_{RC})}{k + 1} , \qquad (2)$$

while every rewarding cooperator acquires

$$P_{RC} = P_C - \gamma \frac{N_C + N_{RC}}{k} . \qquad (3)$$
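
As a quick way to evaluate Eqs. (1)-(3) for a given group composition, the following Python sketch computes the three single-group payoffs. The function name and argument names are assumptions made for this illustration; `beta` and `gamma` stand for the reward and the cost of rewarding.

```python
def payoffs(n_c, n_d, n_rc, r, beta, gamma, k=4):
    """Single-group payoffs (P_C, P_D, P_RC) of a focal player, following Eqs. (1)-(3).

    n_c, n_d, n_rc count the strategies among the k interaction partners
    (the focal player itself is not included in these counts).
    """
    assert n_c + n_d + n_rc == k, "partner counts must add up to k"
    p_c = r * (n_c + n_rc + 1) / (k + 1) - 1 + beta * n_rc / k   # Eq. (1)
    p_d = r * (n_c + n_rc) / (k + 1)                             # Eq. (2)
    p_rc = p_c - gamma * (n_c + n_rc) / k                        # Eq. (3)
    return p_c, p_d, p_rc
```

For example, `payoffs(0, 0, 4, r=2.0, beta=0.9, gamma=0.05)` evaluates the payoffs of a focal player whose four partners are all rewarding cooperators.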

It is worth pointing out that the cost $\gamma$ and reward $\beta$ are not necessarily identical.
This is easy to justify with realistic examples. To praise someone hardly costs anything, yet it
may do wonders for the recipient. On the other hand, an affectionate spouse can spend a small
fortune on a dress for the partner, only to be later ridiculed for bad taste. While not
necessarily representative, we believe these two simple examples suffice to justify the
introduction of two rather than a single parameter in order to examine the impact of reward
thoroughly, with all its subtleties. We also point out that $\beta$ and $\gamma$ are introduced
normalized with the number of neighboring players k in each group in order to facilitate
comparisons with results obtained on other interaction graphs or by using differently sized
groups. Moreover, the values of $\beta$ and $\gamma$ then represent maximally attainable values
within each group and the setup is directly comparable with the previously studied punishment
model [15]. As was reported
in [ini[3S], here it holds too that the presented results are 
robust to reasonable variations of the underlying network 
structure and group size. 
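
The effect of the normalization by k can be made explicit. Assuming non-negative reward and cost, and using only the reward and cost terms of Eqs. (1) and (3), the per-group amounts are bounded as follows:

```latex
0 \le \beta \, \frac{N_{RC}}{k} \le \beta , \qquad
0 \le \gamma \, \frac{N_C + N_{RC}}{k} \le \gamma .
```

The upper bounds are reached when all k partners are rewarding cooperators and when all k partners cooperate, respectively.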

After the three strategies on the L × L square lattice are distributed uniformly at random, a
random sequential update with the following elementary steps is performed. First, a randomly
selected player x plays the public goods game with the k interaction partners of a group g, and
obtains a payoff $P_x^g$ from each of the k + 1 = 5 groups it belongs to. The overall payoff is
thus $P_x = \sum_g P_x^g$. Next, one of the four nearest neighbors of player x is chosen randomly,
and its location is denoted by y. Player y acquires its payoff $P_y$ in the same way as player x.
Finally, player y imitates the strategy of player x with the probability
$q = 1/\{1 + \exp[(P_y - P_x)/K]\}$, where K determines the level of uncertainty in strategy
adoptions [36]. Without loss of generality we set K = 0.5, implying that better performing
players are readily imitated, but it is not impossible to adopt the strategy of a player
performing worse. Such errors in judgment can be attributed to mistakes and external influences
that affect the evaluation of the opponent. During each full Monte Carlo step, every player has a
chance to adopt a strategy from one of its neighbors once on average. Depending on the typical
size of emerging spatial patterns, the linear system size was varied from L = 400 to L = 5000 in
order to avoid finite
size effects, and the equilibration required up to 10 full 
Monte Carlo steps (MCS). 
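
The update rule described above can be sketched as follows. This is a minimal, unoptimized illustration and not the authors' code: the integer strategy encoding, the nested-list lattice and all function names are assumptions, and the lattice is assumed to have L >= 3 so that the four neighbors of a site are distinct.

```python
import math
import random

C, D, RC = 0, 1, 2  # hypothetical integer labels for the three strategies


def neighbors(x, y, L):
    """Four nearest neighbors of (x, y) with periodic boundaries."""
    return [((x + 1) % L, y), ((x - 1) % L, y), (x, (y + 1) % L), (x, (y - 1) % L)]


def group_payoff(s, others, r, beta, gamma, k=4):
    """Payoff of a player with strategy s from one group; others holds the k partner strategies."""
    n_c, n_rc = others.count(C), others.count(RC)
    if s == D:
        return r * (n_c + n_rc) / (k + 1)
    p = r * (n_c + n_rc + 1) / (k + 1) - 1 + beta * n_rc / k
    if s == RC:
        p -= gamma * (n_c + n_rc) / k
    return p


def total_payoff(grid, x, y, r, beta, gamma):
    """Sum of the payoffs that player (x, y) collects from the five groups it belongs to."""
    L = len(grid)
    total = 0.0
    for fx, fy in [(x, y)] + neighbors(x, y, L):
        members = [(fx, fy)] + neighbors(fx, fy, L)
        others = [grid[a][b] for (a, b) in members if (a, b) != (x, y)]
        total += group_payoff(grid[x][y], others, r, beta, gamma)
    return total


def imitation_probability(p_x, p_y, K=0.5):
    """Probability q that player y adopts the strategy of player x."""
    return 1.0 / (1.0 + math.exp((p_y - p_x) / K))


def elementary_step(grid, r, beta, gamma, K=0.5, rng=random):
    """Pick a random player x and a random neighbor y; y imitates x with probability q."""
    L = len(grid)
    x, y = rng.randrange(L), rng.randrange(L)
    nx, ny = rng.choice(neighbors(x, y, L))
    p_x = total_payoff(grid, x, y, r, beta, gamma)
    p_y = total_payoff(grid, nx, ny, r, beta, gamma)
    if rng.random() < imitation_probability(p_x, p_y, K):
        grid[nx][ny] = grid[x][y]


def monte_carlo_step(grid, r, beta, gamma, K=0.5, rng=random):
    """One full MCS: L * L elementary steps, i.e. one update attempt per player on average."""
    for _ in range(len(grid) ** 2):
        elementary_step(grid, r, beta, gamma, K, rng)
```

Payoffs are recomputed from scratch at every elementary step, which keeps the sketch short; lattices of the sizes quoted in the text would call for a more efficient implementation.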

Results. — In the absence of reward, cooperators survive only if r > 3.74 and crowd out defectors
completely for r > 5.49 if using the square lattice as the interaction graph [35]. These can be
used as benchmark values for evaluating the impact of reward on the evolution of cooperation in
structured populations. Accordingly, we will focus on three different values of the synergy
factor r, namely 2.0, 3.5 and 4.4, representative of low, intermediate and high synergetic
effects of cooperation, respectively. Hereafter, we will thus systematically examine how
different combinations of reward $\beta$ (the benefit the recipient experiences upon being
rewarded) and cost $\gamma$ (of giving the reward) affect the survivability of the three
strategies on the square lattice.

Fig. 1: (a) Full reward-cost phase diagram obtained for the synergy factor r = 2.0. Different
phases are denoted by the symbols of the strategies that survive in the final strategy
distribution. Solid blue lines indicate continuous phase transitions. A typical cross-section of
the phase diagram at the cost $\gamma = 0.01$ is shown in panel (b), depicting the fraction of
cooperators $\rho_C$, defectors $\rho_D$ and rewarding cooperators $\rho_{RC}$ in dependence on
the reward $\beta$.

We start with the low r limit, thus setting r = 2.0. 
Figure 1(a) features the full reward-cost phase diagram, 
where it can be observed that the pure D phase first gives 
way to a very narrow region of coexistence of D+RC and 
shortly thereafter reaches the pure RC phase as the reward 
increases. The blue transition lines, indicating continu- 
ous second-order phase transitions, lean towards higher 
rewards for larger costs, yet this effect is expected and val- 
idates the behavior of the examined model. Most remark- 
able is the reappearance of defectors in a stable D+C+RC 
phase if the reward is increased further, thus giving rise to 
a stable coexistence of all three strategies. Finally, if the 
reward is higher still and the costs remain moderate (note 
that the slope of the rightmost transition line is consider- 
ably larger), defectors again die out and leave C and RC 
as the only remaining strategies. Notably, here C and RC 
are not equivalent strategies as was the case in a recently 
studied punishment model [TSJ[T5], and thus their stable 
coexistence is possible. 
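
The phase labels used in the diagrams (D, D+RC, RC, D+C+RC, C+RC) simply list the strategies with a nonzero stationary density. A minimal sketch of that labeling in Python, where the function name and the tolerance `tol` are assumptions introduced here:

```python
def phase_label(rho_d, rho_c, rho_rc, tol=0.0):
    """Name a phase by the strategies whose stationary density exceeds tol."""
    densities = [("D", rho_d), ("C", rho_c), ("RC", rho_rc)]
    return "+".join(name for name, rho in densities if rho > tol)
```

In a finite simulation a small positive `tol` may be preferable, since a strategy that is about to die out can linger at a very low density.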

Turning to the reappearance of defectors for intermediate rewards, we show in Fig. 1(b) a
characteristic cross-section of the phase diagram obtained for $\gamma = 0.01$. In agreement with
the four blue lines depicted in Fig. 1(a), we can observe four continuous phase transitions. From
left to right we have, first, the emergence of rewarding cooperators ($\rho_{RC} > 0$), which is
quickly followed by the extinction of defectors ($\rho_D = 0$). Subsequently, defectors (D)
reappear together with pure cooperators (C) to form the coexistence of all three strategies, and
finally, at $\beta \approx 0.873$, defectors die out again. Interpreting these observations, for
sufficiently large $\beta$ the rewarding cooperators can support each other and protect
themselves against the invasion of defectors. In accordance with the well-known network
reciprocity mechanism, rewarding cooperators aggregate into compact clusters with a smooth
interface (not shown here). At still higher $\beta$, rewarding cooperators become so effective
that defectors cannot survive. Remarkably, for $\beta > 0.775$ the support of cooperative actions
becomes powerful enough to enable not just the proliferation of rewarding cooperators (RC), but
also the survivability of pure cooperators (C). But since the synergy factor is low (r = 2.0),
the pure cooperators are susceptible to exploitation by defectors and can therefore survive only
in the vicinity of rewarding cooperators. Nevertheless, the emergence of pure cooperators
simultaneously enables also the survivability of defectors via a stable D+C+RC phase that is
governed by cyclic dominance.

Fig. 2: Characteristic snapshots of a 100 x 100 square lattice with specially prepared initial
conditions (see main text for details). Colors red, green and blue depict the location of
defectors (D), rewarding cooperators (RC) and cooperators (C), respectively. The snapshots were
taken at 0 (a), 140 (b), 560 (c) and 600 (d) full MCS, and the parameter values were r = 2.0,
$\gamma = 0.05$ and $\beta = 0.9$.

The workings of this cyclic dominance can be demonstrated by examining the snapshots of strategy
distributions. Figure 2(a) depicts a prepared initial state, whereafter the movements of the
boundaries that separate the three strategies give vital insight into the dominance between
them. Due to the small synergy factor r, the defectors (red) can easily invade the blue region of
pure cooperators. Simultaneously, since the reward is large, rewarding cooperators (green) can
outperform defectors. In the midst of rewarding cooperators, however, pure cooperators (blue) can
spread as well because they enjoy the significant benefits of reward but do not bear any costs.
But as soon as some of the pure cooperators depart from the safe haven of rewarding cooperators,
the whole circle of invasion starts anew, leading to an upsurge of defectors (red), who are then
again conquered by rewarding cooperators, who then again foster the spreading of pure
cooperators, and so on. Clearly thus, the three strategies form a closed loop of dominance, which
can be observed nicely by following the snapshots presented in Fig. 2 from left to right
(qualitatively identical spatial patterns emerge from random initial conditions if the system
size is sufficiently large). It is worth emphasizing that if one of the three strategies dies out
by chance due to a small system size, the balance within the closed loop of dominance is broken,
and accordingly, one of the remaining two strategies spreads across the whole grid. To avoid
this, it is therefore paramount to use sufficiently large system sizes. Interestingly, the
stationary density of defectors is considerable, and the increase of the $\rho_D(\beta)$ function
is the more dramatic the larger the cost of reward $\gamma$. This is in agreement with the
behavior of predator-prey systems where the direct support of prey will ultimately be beneficial
for the predators. Naturally, if the reward $\beta$ is even larger, defectors cannot survive and
the system arrives at the mixed C+RC phase, as depicted in Figs. 1(a) and (b). Note that the
qualitative behavior thereafter does not change and the fraction of cooperators and rewarding
cooperators converges to a nonzero value. This, however, is a unique consequence of the spatial
structure since in well-mixed populations cooperators (C), i.e. second-order free-riders, clearly
perform better than rewarding cooperators (RC) and should thus become dominant. In fact, the
mechanism that allows rewarding cooperators to survive in the sea of second-order free-riders is
identical to the one revealed by Nowak and May when studying the two-strategy spatial prisoner's
dilemma game on the square lattice [37]. In our case RCs also form compact clusters that allow
them to survive the competition with the superior pure cooperators.
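
The advantage of pure cooperators in the absence of spatial structure can be read off Eq. (3) directly. As a short worked step, assuming a positive cost of rewarding, the payoff difference between a pure and a rewarding cooperator facing the same k partners is

```latex
P_C - P_{RC} = \gamma \, \frac{N_C + N_{RC}}{k} \ge 0 ,
```

with equality only if none of the k partners cooperates. A pure cooperator therefore never earns less than a rewarding cooperator in the same surroundings, and rewarding cooperators can persist only where the surroundings differ, as happens inside compact clusters.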

To explore the robustness of our observations obtained for the small synergy factor r = 2.0, we
study the evolution of cooperation also for the intermediate value r = 3.5, although this still
results in a pure D phase in the absence of reward [35]. From the reward-cost phase diagram
presented in Fig. 3(a) it follows that the qualitative features, if compared to Fig. 1(a), remain
largely intact. The most significant change is the expansion of the mixed D+RC phase, ultimately
leading to the disappearance of the pure RC phase. Note that the stable coexistence of RCs in the
sea of Ds is, as with the C+RC phase, due to spatial reciprocity [37], allowing the inferior
strategy (as obtained for well-mixed populations) to survive by means of clustering. On the other
hand, the coexistence phase containing all three strategies, along with the cyclic dominance
between them, is fully preserved. In fact, the D+C+RC region is expanded too, which is
counterintuitive in view of the larger synergy factor used. The latter, of course, promotes
cooperation, and should thus be detrimental rather than beneficial to the survivability of
defectors.

Figure 3(b) supports this surprising outcome from a quantitative perspective. Indeed, the upsurge
of defectors, going up to $\rho_D \approx 0.19$, is significantly stronger than what was observed
for r = 2.0 in Fig. 1(b). The reason for this, we argue, is the fact that larger values of r
support both the rewarding and the pure cooperators. The larger abundance of pure cooperators, in
particular, gives the defectors more opportunities to conquer lost ground from rewarding
cooperators. Note that in the absence of reward r = 3.5 still fails to sustain cooperative
behavior. Accordingly, the strength of dominance within the closed D → C → RC → D loop
unexpectedly shifts in favor of defectors, which is again an exemplification of how the support
of prey ultimately benefits the predator. Furthermore, although not surprisingly, it can be
observed that the transition lines in Fig. 3(a) and the corresponding phase transitions in
Fig. 3(b) are altogether shifted to significantly lower values of $\beta$, which is expected
since the synergy factor alone provides a better support for the two cooperative strategies. The
emergence of rewarding cooperators, and subsequently also of pure cooperators and defectors by
means of cyclic dominance, can thus be warranted already by substantially lower rewards.

Fig. 3: (a) Full reward-cost phase diagram obtained for the synergy factor r = 3.5 (phases are
denoted by the symbols of surviving strategies). Solid blue lines indicate continuous phase
transitions. A typical cross-section of the phase diagram at the cost $\gamma = 0.05$ is shown in
panel (b), depicting the fraction of cooperators $\rho_C$, defectors $\rho_D$ and rewarding
cooperators $\rho_{RC}$ in dependence on the reward $\beta$.

Lastly, we examine the impact of reward on the evolution of cooperation at a high synergy factor,
thus setting r = 4.4. The reward-cost phase diagram is presented in Fig. 4(a). It differs
considerably from the previous two, predominantly due to the fact that cooperators can, at this
value of r, be sustained by network reciprocity alone. Accordingly, the pure D phase is missing
and the $\rho_D(\beta)$ function is monotonically decreasing, as can be observed from Fig. 4(b).
The existence of the three-strategy phase is also constrained to a significantly smaller portion
of the $\beta$-$\gamma$ parameter plane. Substantial benefits of collaborative efforts thus work
clearly in favor of the two cooperative strategies, which become the main aspirants for supremacy
on the spatial grid. The perseverance of defectors, going extinct only if $\beta > 0.58$, is
nevertheless remarkable.

Fig. 4: (a) Full reward-cost phase diagram obtained for the synergy factor r = 4.4 (phases are
denoted by the symbols of surviving strategies). Solid blue (dotted red) lines indicate
continuous (discontinuous) phase transitions. A typical cross-section of the phase diagram at the
cost $\gamma = 0.01$ is shown in panel (b), depicting the fraction of cooperators $\rho_C$,
defectors $\rho_D$ and rewarding cooperators $\rho_{RC}$ in dependence on $\beta$.

Fig. 5: Time evolution of strategy densities as obtained for r = 4.4, $\gamma = 0.001$,
$\beta = 0.003$ (a) and $\beta = 0.004$ (b). The fraction of defectors is plotted solid red,
while the fractions of pure and rewarding cooperators are depicted by dotted blue and dashed
green lines, respectively. Note the opposite time evolution of the two cooperative strategies
that is induced by a minute change in the height of the reward $\beta$, taking the system from
one side of a discontinuous phase transition to the other.

Results presented in Fig. 4(b) allow for an accurate examination of the competition between pure
(C) and rewarding (RC) cooperators. Unlike for small and intermediate values of r, we can here
observe a discontinuous phase transition [marked dotted red in Fig. 4(a)], by means
of which rewarding cooperators first outperform pure co- 
operators. The mechanism behind this transition is iden- 
tical to the one reported recently in the context of pun- 
ishment in structured populations [IB] , and can be sum- 
marized by an indirect territorial battle as follows. Pure 
and rewarding cooperators form homogeneous isolated islands on the spatial grid and fight
independently against the defectors. If the reward is high enough the rewarding cooperators will
be more successful in this than pure cooperators, and accordingly will have an evolutionary
advantage in the stationary state. Conversely, for less favorable reward conditions, i.e. if
$\beta$ becomes comparable to $\gamma$, pure cooperators will be more successful in gaining
ground from defectors, and accordingly, they will prevail.

These two different evolutionary scenarios can be illustrated nicely by comparing the time
courses of strategy densities for two different values of the reward, at one and the other side
of the transition line, respectively. As Fig. 5 shows, the fraction of defectors becomes
time-independent after a short transient and indeed depends only on the value of r. The indirect
battle between pure and rewarding cooperators starts thereafter and the fractions of these two
strategies will change oppositely depending on $\beta$. If the reward is low, as shown in
Fig. 5(a), pure cooperators outperform defectors more efficiently and hence crowd out also the
rewarding cooperators. At the other side of the discontinuous phase transition point, for a
slightly higher value of $\beta$, as shown in Fig. 5(b), the opposite scenario unfolds, and the
system will eventually evolve to a D+RC phase. Concluding the study, it can be noted from
Fig. 4(b) that as the reward increases further the second-order free-riders gradually better the
rewarding cooperators, for the former enjoy the benefits of reward without participating in
sharing the costs. As defectors die out completely the balance shifts again in favor of rewarding
cooperators by means of the same mechanism that we outlined when describing the results presented
in Figs. 1(b) and 3(b).

Summary. — We have investigated the impact of re- 
ward on the evolution of cooperation in the spatial public 
goods game. Using the square lattice as the underlying in- 
teraction network, we found that costly rewards facilitate 
cooperation most effectively if the synergetic effects of co- 
operation are low. Surprisingly though, high rewards may 
be less effective in promoting cooperation than moderate 
rewards. The intricate patterns of cooperation were examined systematically by means of phase
diagrams, where a
succession of discontinuous and continuous phase transi- 
tions was found separating the stable coexistence of dif- 
ferent strategies. Depending on the synergy factor and 
the details of rewarding, we have demonstrated the sta- 
ble coexistence of all possible combinations of the three 
strategies. The counterintuitive impact of high rewards, in 
particular, was attributed to the spontaneous emergence 
of cyclic dominance between the three strategies, which 
can be molded further by predator-prey-like interactions 
at intermediate synergy factors. Due to the second-order 
free-riding role of traditional cooperators who refuse to 
bear the costs of rewarding, however, it is impossible to 
conclude that rewards in structured populations render 
defection maladaptive. Indeed, defection remains viable 
in considerably large regions of the parameter space, yet 
even for very low synergy factors, properly tuned rewards 
can support cooperation where otherwise defection would 
reign completely. Compared to costly punishment, how- 
ever, the promotion of cooperation by means of costly re- 
wards seems altogether less efficient. Note that in the 
absence of defectors the punishing cooperators become 
equivalent to the cooperators, while rewarding cooperators 
still keep paying the cost of reward and therefore remain 
inferior to the second-order free-riders. Thus, for reward 
to work equally well as punishment, the ratio between the 
benefit and the cost of rewarding must be significantly 
higher than in the case of punishment (cf. [15]). At high synergy factors, on the other hand,
the network reciprocity
alone suffices to decimate the defectors, and the impact of 
reward is then restricted to establishing the victor between 
traditional cooperators and rewarding cooperators only. 
The two duel each other by means of an indirect territo- 
rial battle against defectors, where the winning strategy 
is the one that is more effective in eliminating defectors. 
If the rewarding is costly the winners are the traditional 
cooperators, but if the benefits of reward offset its costs by 
a comfortable margin then the victors are the rewarding 
cooperators. The border between these two outcomes is a 
discontinuous phase transition. In sum, the rich plethora 
of stable pure and mixed phases as well as intriguing dy- 
namical processes that govern the evolution in the pres- 
ence of rewarding clearly point to the complexity of pos- 
sible solutions in structured populations and strengthen 
their prominent role in the pursuit of cooperation. 